Pose-Invariant 3D Face Alignment
Face alignment aims to estimate the locations of a set of landmarks for a
given image. This problem has received much attention as evidenced by the
recent advancement in both the methodology and performance. However, most of
the existing works neither explicitly handle face images with arbitrary poses,
nor perform large-scale experiments on non-frontal and profile face images. In
order to address these limitations, this paper proposes a novel face alignment
algorithm that estimates both 2D and 3D landmarks and their 2D visibilities for
a face image with an arbitrary pose. By integrating a 3D deformable model, a
cascaded coupled-regressor approach is designed to estimate both the camera
projection matrix and the 3D landmarks. Furthermore, the 3D model also allows
us to automatically estimate the 2D landmark visibilities via surface normals.
We gather a substantially larger collection of all-pose face images to evaluate
our algorithm and demonstrate superior performance compared to state-of-the-art
methods.
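The visibility estimation via surface normals described above can be sketched as follows. This is an illustrative toy, not the paper's exact formulation: it assumes a weak-perspective camera looking along the z-axis and a simple dot-product threshold, and marks a landmark visible when its rotated surface normal faces the camera.

```python
import numpy as np

def landmark_visibility(normals, rotation, thresh=0.0):
    """Estimate 2D landmark visibility from 3D surface normals.

    A landmark is marked visible when its surface normal, rotated into
    the camera frame by the head-pose rotation, points toward the camera.
    normals:  (N, 3) unit surface normals on the 3D face model
    rotation: (3, 3) head-pose rotation matrix
    Returns a boolean array of per-landmark visibilities.
    """
    cam_normals = normals @ rotation.T           # rotate normals into camera frame
    view_dir = np.array([0.0, 0.0, 1.0])         # assumed viewing axis (illustrative)
    return cam_normals @ view_dir > thresh       # visible if normal faces the camera
```

For a frontal face the nose-tip normal faces the camera and is visible; after a large yaw rotation, landmarks on the far cheek have normals pointing away and are flagged self-occluded.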
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality
Social presence, the feeling of being there with a real person, will fuel the
next generation of communication systems driven by digital humans in virtual
reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny
effect rely on person-specific (PS) models. However, these PS models are
time-consuming to build and are typically trained with limited data
variability, which results in poor generalization and robustness. Major sources
of variability that affect the accuracy of facial expression transfer
algorithms include using different VR headsets (e.g., camera configuration,
slop of the headset), facial appearance changes over time (e.g., beard,
make-up), and environmental factors (e.g., lighting, backgrounds). This is a
major drawback for the scalability of these models in VR. This paper makes
progress in overcoming these limitations by proposing an end-to-end
multi-identity architecture (MIA) trained with specialized augmentation
strategies. MIA drives the shape component of the avatar from three cameras in
the VR headset (two eyes, one mouth), in untrained subjects, using minimal
personalized information (i.e., neutral 3D mesh shape). Similarly, if the PS
texture decoder is available, MIA is able to drive the full avatar
(shape+texture), robustly outperforming PS models in challenging scenarios. Our
key contribution to improving robustness and generalization is that our method
implicitly decouples, in an unsupervised manner, the facial expression from
nuisance factors (e.g., headset, environment, facial appearance). We
demonstrate the superior performance and robustness of the proposed method
versus state-of-the-art PS approaches in a variety of experiments.
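The shape-driving pipeline described above (three headset camera views fused with a subject's neutral 3D mesh) can be sketched as a toy numpy model. Everything here is an illustrative assumption, not the paper's network: the linear `encode` stands in for a CNN feature extractor, the dimensions are arbitrary, and the decoder simply predicts per-vertex offsets added to the neutral shape.

```python
import numpy as np

def encode(image_feat, W):
    """Toy per-camera encoder: a single linear map (stand-in for a CNN)."""
    return np.tanh(W @ image_feat)

def drive_shape(eye_l, eye_r, mouth, neutral_mesh, params):
    """Illustrative multi-identity shape driver (hypothetical, not MIA itself).

    Fuses features from the three headset cameras (two eyes, one mouth),
    conditions on the subject's neutral mesh as the only personalized
    input, and predicts per-vertex offsets added to the neutral shape.
    """
    feats = np.concatenate([
        encode(eye_l, params["W_eye"]),
        encode(eye_r, params["W_eye"]),    # shared weights for both eye cameras
        encode(mouth, params["W_mouth"]),
    ])
    code = np.concatenate([feats, neutral_mesh.ravel()])   # identity conditioning
    offsets = (params["W_dec"] @ code).reshape(neutral_mesh.shape)
    return neutral_mesh + offsets
```

Conditioning only on the neutral mesh (rather than person-specific weights) is what lets such a model run on untrained subjects: the expression signal comes entirely from the camera features, while identity enters through a single fixed input.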